Abstract. Numerous authors have established a connection between the Compressed Sensing problem without noise and the estimation of the Gel'fand widths. This article shows that this connection is still true in the noisy case. Indeed, we investigate the lasso and the Dantzig selector in terms of the distortion of the design. This latter measures how far the intersection between the kernel of the design matrix and the unit $\ell_1$-ball is from an $\ell_2$-ball. In particular, we exhibit the weakest condition to get oracle inequalities in terms of the $s$-best term approximation.



1. Introduction 

In the past decade much emphasis has been put on recovering a large number of unknown variables from few noisy observations. In particular, we consider the high-dimensional linear model where an experimenter observes a vector $y \in \mathbb{R}^n$ such that

$$y = X\beta^\star + z,$$

where $X \in \mathbb{R}^{n\times p}$ is called the design matrix (known to the experimenter), $\beta^\star \in \mathbb{R}^p$ is an unknown target vector one would like to recover, and $z \in \mathbb{R}^n$ is a stochastic error term that contains all the perturbations of the experiment. Assume that one can provide a constant $\lambda^0_n \in \mathbb{R}$, as small as possible, such that



$$\lambda^0_n \geq \|X^\top z\|_{\ell_\infty} \tag{1}$$

with overwhelming probability (where $X^\top \in \mathbb{R}^{p\times n}$ denotes the transpose of $X$). Observe that this is the only assumption on the noise throughout this paper. We recall a well-known result in the case where $z$ follows an $n$-multivariate Gaussian distribution.

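Lemma 1.1. Suppose that $z = (z_i)_{i=1}^n$ is such that the $z_i$'s are i.i.d. with respect to a Gaussian distribution with mean zero and variance $\sigma_n^2$. Choose $t \geq 1$ and set

$$\lambda^0_n(t) = (1+t)\cdot\|X\|_{\ell_2,\infty}\cdot\sigma_n\cdot\sqrt{\log p},$$

where $\|X\|_{\ell_2,\infty}$ denotes the maximum $\ell_2$-norm of the columns of $X$. Then,

$$\mathbb{P}\big(\lambda^0_n(t) \geq \|X^\top z\|_{\ell_\infty}\big) \geq 1 - \frac{\sqrt{2}}{(1+t)\sqrt{\pi\log p}\;p^{(1+t)^2/2-1}}.$$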
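The probability bound of Lemma 1.1 can be checked by simulation. The sketch below is a minimal illustration. It assumes NumPy, a hypothetical Gaussian design and hypothetical sizes, and the helper names `lambda0`, `lemma_lower_bound` and `empirical_coverage` are ours, not the author's. It computes $\lambda^0_n(t)$ as defined above, then compares the empirical frequency of the event $\lambda^0_n(t) \geq \|X^\top z\|_{\ell_\infty}$ with the lower bound of the lemma.

```python
import numpy as np


def lambda0(X, sigma, t):
    """lambda_n^0(t) = (1 + t) * ||X||_{l2,inf} * sigma * sqrt(log p)."""
    p = X.shape[1]
    max_col_norm = np.linalg.norm(X, axis=0).max()   # largest l2-norm of a column
    return (1.0 + t) * max_col_norm * sigma * np.sqrt(np.log(p))


def lemma_lower_bound(p, t):
    """Right-hand side of the probability bound in Lemma 1.1."""
    expo = (1.0 + t) ** 2 / 2.0 - 1.0
    return 1.0 - np.sqrt(2.0) / ((1.0 + t) * np.sqrt(np.pi * np.log(p)) * p ** expo)


def empirical_coverage(X, sigma, t, n_trials=2000, seed=0):
    """Fraction of draws z ~ N(0, sigma^2 Id_n) for which lambda_n^0(t) >= ||X^T z||_inf."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    lam = lambda0(X, sigma, t)
    Z = sigma * rng.standard_normal((n_trials, n))   # one noise vector per row
    sup_norms = np.abs(Z @ X).max(axis=1)            # ||X^T z||_inf for each draw
    return float(np.mean(sup_norms <= lam))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, sigma, t = 50, 200, 0.5, 1.0               # hypothetical sizes
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    print("empirical coverage:", empirical_coverage(X, sigma, t))
    print("Lemma 1.1 lower bound:", lemma_lower_bound(p, t))
```

Since the lemma only gives a lower bound, the empirical coverage should be at least as large as the printed bound, up to Monte Carlo error.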

Motivated by recent issues in modern research areas, suppose that there are far fewer observations $y_i$ than unknown variables $\beta^\star_j$: this is the so-called $n<p$ setup. For instance, let us mention Compressed Sensing [Don06, CRT06], where one would like to simultaneously acquire and compress a signal using few (non-adaptive) linear measurements. In general terms, we are interested in accurately estimating the target vector $\beta^\star$ and/or the response $X\beta^\star$ from few corrupted observations. During the past decade, this challenging issue has attracted a lot of attention in the statistical community. A breakthrough was initiated by R. Tibshirani in 1996 when he introduced the lasso [Tib96]. It is defined by



$$\beta^\ell \in \arg\min_{\beta\in\mathbb{R}^p}\Big\{\frac12\|X\beta - y\|_{\ell_2}^2 + \lambda_\ell\|\beta\|_{\ell_1}\Big\}, \tag{2}$$

where $\lambda_\ell > 0$ denotes a tuning parameter. Thirteen years later, this estimator continues to play a key role in our understanding of high-dimensional inverse problems. Its popularity might be due to the fact that it is computationally tractable. Indeed, the lasso can be recast as a Second Order Cone Program (SOCP) that can be solved using an interior point method. In the same way, E.J. Candès and T. Tao [CT07] introduced the Dantzig selector as

$$\beta^d \in \arg\min_{\beta\in\mathbb{R}^p}\|\beta\|_{\ell_1}\quad\text{s.t.}\quad\|X^\top(y - X\beta)\|_{\ell_\infty} \leq \lambda_d, \tag{3}$$

where $\lambda_d > 0$ is a tuning parameter. It is known that it can be recast as a linear program. Hence, it is also computationally tractable. A great statistical challenge is then to find efficiently verifiable conditions on $X$ ensuring that the lasso (2) and the Dantzig selector (3) recover "most of the information" about the target vector $\beta^\star$.
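To make (2) and (3) concrete, here is a minimal sketch. It assumes NumPy and a recent SciPy that provides the `highs` method of `scipy.optimize.linprog`. Note that it swaps in solvers other than those mentioned above:

- the lasso is approximated by proximal gradient descent (ISTA, i.e. iterative soft-thresholding) instead of an interior point SOCP method;
- the Dantzig selector is written as a linear program in variables $u, v \geq 0$ with $\beta = u - v$.

The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (entrywise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_ista(X, y, lam, n_iter=5000):
    """Approximate minimizer of 0.5 * ||X b - y||_2^2 + lam * ||b||_1, as in (2)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - grad / L, lam / L)
    return b


def dantzig_lp(X, y, lam):
    """Dantzig selector (3): min ||b||_1 s.t. ||X^T (y - X b)||_inf <= lam.

    With b = u - v, u, v >= 0, the objective is sum(u) + sum(v) and the
    constraint reads -lam <= X^T y - G (u - v) <= lam, where G = X^T X.
    """
    p = X.shape[1]
    G = X.T @ X
    Xty = X.T @ y
    c = np.ones(2 * p)
    A_ub = np.vstack([np.hstack([G, -G]),      #  G (u - v) <= lam + X^T y
                      np.hstack([-G, G])])     # -G (u - v) <= lam - X^T y
    b_ub = np.concatenate([lam + Xty, lam - Xty])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]
```

ISTA is used here only because it fits in a few lines; any lasso solver could be substituted.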

1.1. An oracle inequality. What do we precisely mean by "most of the information" about the target? What is the amount of information one could recover from few observations? These are two of the important questions raised by Compressed Sensing. Suppose that you want to find an $s$-sparse vector (i.e. a vector with at most $s$ non-zero coefficients) that represents the target; then you would probably want it to contain the $s$ largest (in magnitude) coefficients $\beta^\star_i$. More precisely, denote by $S^\star \subseteq \{1,\dots,p\}$ the set of the indices of the $s$ largest coefficients. The $s$-best term approximation vector is $\beta^\star_{S^\star} \in \mathbb{R}^p$, where $(\beta^\star_{S^\star})_i = \beta^\star_i$ if $i \in S^\star$ and $0$ otherwise. Observe that it is the $s$-sparse projection with respect to any $\ell_q$-norm for $1 \leq q < +\infty$ (i.e. it minimizes the $\ell_q$-distance to $\beta^\star$ among all the $s$-sparse vectors). It is thus the most natural approximation by an $s$-sparse vector.
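The $s$-best term approximation is straightforward to compute. Below is a minimal NumPy sketch with hypothetical function names. It breaks ties arbitrarily, so $S^\star$ need not be unique.

```python
import numpy as np


def best_s_term(beta, s):
    """Return (beta_S, S): keep the s largest-magnitude entries of beta, zero the rest."""
    beta = np.asarray(beta, dtype=float)
    S = np.argsort(-np.abs(beta), kind="stable")[:s]   # indices of the s largest |beta_i|
    approx = np.zeros_like(beta)
    approx[S] = beta[S]
    return approx, np.sort(S)


def best_s_term_error(beta, s, q=1):
    """l_q-norm of beta - beta_S, i.e. the error of the s-best term approximation."""
    approx, _ = best_s_term(beta, s)
    return float(np.linalg.norm(np.asarray(beta, dtype=float) - approx, ord=q))
```

For instance, with $\beta^\star = (0.5, -3, 0, 2, -0.1)$ and $s = 2$, the kept indices are 1 and 3 (0-based) and the $\ell_1$-error of the approximation is $0.5 + 0.1 = 0.6$.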

Suppose that someone gives you all the keys to recover $\beta^\star_{S^\star}$. More precisely, imagine that you know the subset $S^\star$ in advance and that you observe $y^{\text{oracle}} = X\beta^\star_{S^\star} + z$. This ideal situation is referred to as the oracle case. Assume that the noise $z$ is a Gaussian white noise of standard deviation $\sigma_n$, i.e. $z \sim \mathcal{N}_n(0, \sigma_n^2\,\mathrm{Id}_n)$, where $\mathcal{N}_n$ denotes the $n$-multivariate Gaussian distribution. Then the optimal estimator is the ordinary least squares estimator $\beta^{\text{ideal}} \in \mathbb{R}^p$ on the subset $S^\star$, namely

$$\beta^{\text{ideal}} \in \arg\min_{\mathrm{Supp}(\beta)\subseteq S^\star}\|X\beta - y^{\text{oracle}}\|_{\ell_2}^2,$$

where $\mathrm{Supp}(\beta) \subseteq \{1,\dots,p\}$ denotes the support (i.e. the set of the indices of the non-zero coefficients) of the vector $\beta$. It holds

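$$\|\beta^{\text{ideal}} - \beta^\star\|_{\ell_1} \leq \|\beta^{\text{ideal}} - \beta^\star_{S^\star}\|_{\ell_1} + \|\beta^\star_{(S^\star)^c}\|_{\ell_1} \leq \sqrt{s}\,\|\beta^{\text{ideal}} - \beta^\star_{S^\star}\|_{\ell_2} + \|\beta^\star_{(S^\star)^c}\|_{\ell_1},$$

where $\beta^\star_{(S^\star)^c} = \beta^\star - \beta^\star_{S^\star}$, so that $\|\beta^\star_{(S^\star)^c}\|_{\ell_1}$ is the $\ell_1$-error of the $s$-best term approximation. An easy calculation shows that

$$\mathbb{E}\,\|\beta^{\text{ideal}} - \beta^\star_{S^\star}\|_{\ell_2}^2 = \mathrm{Trace}\big((X_{S^\star}^\top X_{S^\star})^{-1}\big)\cdot\sigma_n^2 \geq \frac{\sigma_n^2}{\rho_1^2}\cdot s,$$

where $X_{S^\star} \in \mathbb{R}^{n\times s}$ denotes the matrix composed of the columns $X_i \in \mathbb{R}^n$ of $X$ such that $i \in S^\star$, and $\rho_1$ is the largest singular value of $X$. It yields that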



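$$\big(\mathbb{E}\,\|\beta^{\text{ideal}} - \beta^\star_{S^\star}\|_{\ell_2}^2\big)^{1/2} \geq \frac{\sigma_n}{\rho_1}\cdot\sqrt{s}.$$

In a nutshell, the $\ell_1$-distance between the target $\beta^\star$ and the optimal estimator $\beta^{\text{ideal}}$ can reasonably be said to be of the order of

$$\frac{\sigma_n}{\rho_1}\cdot s + \|\beta^\star_{(S^\star)^c}\|_{\ell_1}. \tag{4}$$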

In this article, we say that the lasso satisfies a variable selection oracle inequality of order $s$ if and only if its $\ell_1$-distance to the target, namely $\|\beta^\ell - \beta^\star\|_{\ell_1}$, is bounded by (4) up to a "satisfactory" multiplicative factor.

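In some situations it could be interesting to have a good approximation of $X\beta^\star$. In the oracle case, we have

$$\|X\beta^{\text{ideal}} - X\beta^\star\|_{\ell_2} \leq \|X\beta^{\text{ideal}} - X\beta^\star_{S^\star}\|_{\ell_2} + \|X\beta^\star_{(S^\star)^c}\|_{\ell_2} \leq \|X\beta^{\text{ideal}} - X\beta^\star_{S^\star}\|_{\ell_2} + \rho_1\|\beta^\star_{(S^\star)^c}\|_{\ell_1},$$

where $\rho_1$ denotes the largest singular value of $X$. An easy calculation gives that

$$\mathbb{E}\,\|X\beta^{\text{ideal}} - X\beta^\star_{S^\star}\|_{\ell_2}^2 = \mathrm{Trace}\big(X_{S^\star}(X_{S^\star}^\top X_{S^\star})^{-1}X_{S^\star}^\top\big)\cdot\sigma_n^2 = \sigma_n^2\cdot s.$$

Hence a tolerable upper bound is given by

$$\sigma_n\cdot\sqrt{s} + \rho_1\|\beta^\star_{(S^\star)^c}\|_{\ell_1}. \tag{5}$$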

We say that the lasso satisfies an error prediction oracle inequality of order s if and only 
if its prediction error is upper bounded by (5) up to a "satisfactory" multiplicative 
factor (say logarithmic in p). 
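The two expectations used in the oracle case can be checked by Monte Carlo simulation. The sketch below assumes NumPy, Gaussian white noise and hypothetical sizes, and `oracle_risks` and `oracle_theory` are illustrative names. It fits least squares on the known support. It then compares the averages with $\sigma_n^2\,\mathrm{Trace}((X_{S^\star}^\top X_{S^\star})^{-1})$ and $\sigma_n^2 s$, together with the lower bound $\sigma_n^2 s/\rho_1^2$.

```python
import numpy as np


def oracle_risks(X, beta_S, S, sigma, n_trials=4000, seed=0):
    """Monte Carlo estimates of E||b_ideal - beta_S||_2^2 and E||X b_ideal - X beta_S||_2^2.

    b_ideal is the least-squares fit restricted to the columns in S, computed
    from y_oracle = X beta_S + z with z ~ N(0, sigma^2 Id_n).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    XS = X[:, S]
    coef_err, pred_err = 0.0, 0.0
    for _ in range(n_trials):
        y = X @ beta_S + sigma * rng.standard_normal(n)
        coef, *_ = np.linalg.lstsq(XS, y, rcond=None)
        diff = coef - beta_S[S]                 # both vectors vanish outside S
        coef_err += diff @ diff
        pred_err += np.sum((XS @ diff) ** 2)
    return coef_err / n_trials, pred_err / n_trials


def oracle_theory(X, S, sigma):
    """sigma^2 Trace((X_S^T X_S)^{-1}), sigma^2 s, and the lower bound sigma^2 s / rho_1^2."""
    XS = X[:, S]
    rho1 = np.linalg.norm(X, 2)                 # largest singular value of X
    coef = sigma**2 * np.trace(np.linalg.inv(XS.T @ XS))
    return coef, sigma**2 * len(S), sigma**2 * len(S) / rho1**2
```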

1.2. The Universal Distortion Property (UDP). This article investigates a new sufficient condition to prove oracle inequalities for the lasso. We introduce the Universal Distortion Property UDP$(S_0, \kappa_0, \Delta)$ as follows.

Definition 1 (UDP$(S_0, \kappa_0, \Delta)$). A matrix $X \in \mathbb{R}^{n\times p}$ satisfies the universal distortion condition of order $S_0$, magnitude $\kappa_0$ and parameter $\Delta$ if and only if

• $1 \leq S_0 \leq p$,

• $0 < \kappa_0 < 1/2$,

• and for all $\gamma \in \mathbb{R}^p$, for all integers $s \in \{1,\dots,S_0\}$, for all subsets $S \subseteq \{1,\dots,p\}$ such that $|S| = s$, it holds

$$\|\gamma_S\|_{\ell_1} \leq \Delta\sqrt{s}\,\|X\gamma\|_{\ell_2} + \kappa_0\|\gamma\|_{\ell_1}. \tag{6}$$

This property is similar to the Compatibility Condition of van de Geer and Bühlmann [vdGB09], although it is weaker (see Section 2 for a comparison with the usual conditions). As a matter of fact, every full rank matrix satisfies the UDP condition with explicit parameters in terms of the geometry (e.g. the distortion) of its kernel, cf. Lemma 1.2.
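For a single vector $\gamma$, inequality (6) is easy to test, because for fixed $s$ the worst subset $S$ consists of the $s$ largest entries of $|\gamma|$. The NumPy sketch below (hypothetical helper names) uses this in two ways:

- it checks (6) for one $\gamma$;
- over vectors sampled near the kernel of $X$, it computes the smallest $\Delta$ compatible with the samples.

The second computation only yields a lower bound on an admissible $\Delta$. It does not certify that $X$ satisfies UDP, which requires (6) for every $\gamma \in \mathbb{R}^p$.

```python
import numpy as np


def udp_holds_for(X, gamma, S0, kappa0, Delta, tol=1e-12):
    """Check inequality (6) for one vector gamma and every s <= S0.

    For fixed s the worst subset S collects the s largest |gamma_i|, so it is
    enough to compare cumulative sums of the sorted magnitudes.
    """
    mags = np.sort(np.abs(gamma))[::-1]
    top = np.cumsum(mags)[:S0]                    # worst ||gamma_S||_1, s = 1..S0
    s = np.arange(1, len(top) + 1)
    rhs = Delta * np.sqrt(s) * np.linalg.norm(X @ gamma) + kappa0 * mags.sum()
    return bool(np.all(top <= rhs + tol))


def delta_lower_bound(X, S0, kappa0, eps=1e-2, n_samples=2000, seed=0):
    """Smallest Delta making (6) hold on random vectors drawn near ker(X).

    Any Delta for which X satisfies UDP(S0, kappa0, Delta) is at least this value.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kernel = np.linalg.svd(X)[2][n:]              # rows span ker(X) when X has rank n
    s = np.arange(1, S0 + 1)
    best = 0.0
    for _ in range(n_samples):
        gamma = kernel.T @ rng.standard_normal(kernel.shape[0]) + eps * rng.standard_normal(p)
        mags = np.sort(np.abs(gamma))[::-1]
        excess = np.cumsum(mags)[:S0] - kappa0 * mags.sum()
        need = np.max(excess / (np.sqrt(s) * np.linalg.norm(X @ gamma)))
        best = max(best, float(need))
    return best
```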

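1.2.1. The distortion. We recall the definition of the distortion.

Definition 2. A subspace $\Gamma \subseteq \mathbb{R}^p$ has a distortion $1 \leq \delta \leq \sqrt{p}$ if and only if

$$\forall x \in \Gamma,\quad \|x\|_{\ell_1} \leq \sqrt{p}\,\|x\|_{\ell_2} \leq \delta\,\|x\|_{\ell_1}.$$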

A long-standing issue in approximation theory in Banach spaces is finding "almost-Euclidean" sections of the unit $\ell_1$-ball, i.e. subspaces with a distortion close to 1 and a dimension close to $p$. In particular, we recall that it has been established [Kas77] that there exist subspaces of dimension $p - n$ such that

$$\delta \leq C\Big(\frac{p\,(1 + \log(p/n))}{n}\Big)^{1/2}, \tag{7}$$

where $C > 0$ is a universal constant. In other words, it was shown that, for all $n < p$, there exists a subspace $\Gamma_n$ of dimension $p - n$ such that, for all $x \in \Gamma_n$,

$$\|x\|_{\ell_2} \leq C\Big(\frac{1 + \log(p/n)}{n}\Big)^{1/2}\|x\|_{\ell_1}.$$

We discuss recent deterministic constructions of almost-Euclidean sections of the $\ell_1$-ball in Section 3.
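Computing the distortion of Definition 2 exactly requires maximizing $\sqrt{p}\,\|x\|_{\ell_2}/\|x\|_{\ell_1}$ over the whole subspace, which the sketch below does not attempt. Random sampling of the kernel only gives a lower bound on $\delta$. The following NumPy sketch is a hypothetical illustration, assuming $X$ has rank $n < p$ so that the last $p - n$ right singular vectors span its kernel. It returns the largest ratio observed over sampled kernel vectors.

```python
import numpy as np


def kernel_basis(X):
    """Orthonormal basis (as rows) of ker(X), assuming X has rank n < p."""
    n = X.shape[0]
    return np.linalg.svd(X)[2][n:]


def distortion_lower_bound(X, n_samples=5000, seed=0):
    """Largest observed sqrt(p) * ||x||_2 / ||x||_1 over random x in ker(X).

    The distortion delta is the supremum of this ratio over the kernel, so the
    returned value is only a lower bound on delta.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    B = kernel_basis(X)
    V = rng.standard_normal((n_samples, B.shape[0])) @ B   # each row lies in ker(X)
    ratios = np.sqrt(p) * np.linalg.norm(V, axis=1) / np.abs(V).sum(axis=1)
    return float(ratios.max())
```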

1.2.2. A universal property. We name the property "Universal Distortion" for two reasons: it is satisfied by all full rank matrices, and its parameters $S_0$ and $\Delta$ can be expressed in terms of the distortion. Indeed, we show the following lemma.

Lemma 1.2. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Denote by $\delta$ the distortion of its kernel and by $\rho_n$ its smallest singular value. Let $0 < \kappa_0 < 1/2$; then $X$ satisfies UDP$(S_0, \kappa_0, \Delta)$ where

$$S_0 = \Big(\frac{\kappa_0}{\delta}\Big)^2 p \quad\text{and}\quad \Delta = \frac{2\delta}{\rho_n}. \tag{8}$$

This lemma is sharp in the following sense. The parameter $S_0$ represents (see Theorem 1.3) the maximum number of coefficients that can be recovered using the lasso; we call it the sparsity level. It is known [CDD09] that the best bound one could expect is

$$S_{\text{opt}} \approx n/\log(p/n),$$

up to a multiplicative constant. In the case where (7) holds, the sparsity level satisfies

$$S_0 \approx \kappa_0^2\,S_{\text{opt}}. \tag{9}$$

This shows that any design matrix with low distortion satisfies the UDP condition with an optimal sparsity level.
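The parameters of Lemma 1.2 and the sparsity level in (9) are simple arithmetic. A minimal Python sketch follows. The constant $C$ in (7) is unknown, so `C=1.0` below is only a placeholder, and the function names are hypothetical.

```python
import math


def udp_from_distortion(delta, rho_n, kappa0, p):
    """Parameters of Lemma 1.2: S0 = (kappa0 / delta)^2 * p and Delta = 2 * delta / rho_n.

    S0 is returned as a float; in practice one would use floor(S0), which must be >= 1.
    """
    if not 0.0 < kappa0 < 0.5:
        raise ValueError("kappa0 must lie in (0, 1/2)")
    if not 1.0 <= delta <= math.sqrt(p):
        raise ValueError("a distortion lies in [1, sqrt(p)]")
    return (kappa0 / delta) ** 2 * p, 2.0 * delta / rho_n


def kashin_distortion(p, n, C=1.0):
    """Right-hand side of (7); C = 1.0 is a placeholder for the unknown universal constant."""
    return C * math.sqrt(p * (1.0 + math.log(p / n)) / n)
```

Plugging the bound (7) into (8) gives $S_0 = \kappa_0^2\,n/(C^2(1 + \log(p/n)))$, which is the $\kappa_0^2 S_{\text{opt}}$ behaviour stated in (9).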

1.3. Results. The results presented here fall into two parts. In the first part we assume only that UDP holds. In particular, it is not excluded that one can get better upper bounds on the parameters than those of Lemma 1.2. As a matter of fact, the smaller $\Delta$ is, the sharper the oracle inequalities are. In the second part, we give oracle inequalities in terms of the distortion of the design only.

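Theorem 1.3. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Assume that $X$ satisfies UDP$(S_0, \kappa_0, \Delta)$ and that (1) holds. Then for any

$$\lambda_\ell > \lambda^0_n/(1 - 2\kappa_0), \tag{10}$$

it holds

$$\|\beta^\ell - \beta^\star\|_{\ell_1} \leq \frac{2}{\big(1 - \lambda^0_n/\lambda_\ell\big) - 2\kappa_0}\ \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq S_0}}\Big(\lambda_\ell\,\Delta^2 s + \|\beta^\star_{S^c}\|_{\ell_1}\Big). \tag{11}$$

Invoking Lemma 1.2, the following holds: for every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/2$ and $\lambda_\ell$ satisfying (10), we have

$$\|\beta^\ell - \beta^\star\|_{\ell_1} \leq \frac{2}{\big(1 - \lambda^0_n/\lambda_\ell\big) - 2\kappa_0}\ \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq (\kappa_0/\delta)^2 p}}\Big(\frac{4\lambda_\ell\,\delta^2}{\rho_n^2}\, s + \|\beta^\star_{S^c}\|_{\ell_1}\Big), \tag{12}$$

where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.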
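Consider the case where the noise satisfies the hypothesis of Lemma 1.1 and take $\lambda^0_n = \lambda^0_n(1)$. Assume that $\kappa_0$ is constant (say $\kappa_0 = 1/4$) and take $\lambda_\ell = 3\lambda^0_n = 6\,\|X\|_{\ell_2,\infty}\,\sigma_n\sqrt{\log p}$, so that the prefactor in (11) equals $2/((1 - 1/3) - 1/2) = 12$. Then (11) becomes

$$\|\beta^\ell - \beta^\star\|_{\ell_1} \leq 12\min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq S_0}}\Big(6\,\|X\|_{\ell_2,\infty}\,\Delta^2\sqrt{\log p}\ \sigma_n\, s + \|\beta^\star_{S^c}\|_{\ell_1}\Big),$$

which is an oracle inequality up to a multiplicative factor $\Delta^2\sqrt{\log p}$. In the same way, (12) becomes

$$\|\beta^\ell - \beta^\star\|_{\ell_1} \leq 12\min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq p/(16\delta^2)}}\Big(24\,\|X\|_{\ell_2,\infty}\,\frac{\delta^2\sqrt{\log p}}{\rho_n^2}\ \sigma_n\, s + \|\beta^\star_{S^c}\|_{\ell_1}\Big),$$

which is an oracle inequality up to a multiplicative factor $C_{\text{mult}} := (\delta^2\sqrt{\log p})/\rho_n^2$. In the optimal case (7), this latter becomes

$$C_{\text{mult}} \leq C^2\,\frac{p\,(1 + \log(p/n))\sqrt{\log p}}{n\,\rho_n^2}, \tag{13}$$

where $C > 0$ is the same universal constant as in (7). Roughly speaking, up to a factor of the order of (13), the lasso is as good as the oracle that knows the $S_0$-best term approximation of the target. Moreover, as mentioned in (9), $S_0$ is an optimal sparsity level. However, this multiplicative constant takes small values only for a restricted range of the parameter $n$. As a matter of fact, it is meaningful when $n$ is a constant fraction of $p$.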
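The right-hand side of (11) can be evaluated for a given target. For fixed $s$, $\|\beta^\star_{S^c}\|_{\ell_1}$ is minimized by taking the $s$ largest entries, so the minimum over $S$ reduces to a minimum over $s$. Below is a minimal NumPy sketch with hypothetical names; it checks condition (10) before evaluating the bound.

```python
import numpy as np


def l1_bound_lasso(beta_star, lam, lam0, kappa0, Delta, S0):
    """Right-hand side of (11) and the minimizing s, for a given target beta_star."""
    if not lam > lam0 / (1.0 - 2.0 * kappa0):
        raise ValueError("condition (10) fails")
    mags = np.sort(np.abs(beta_star))[::-1]
    s = np.arange(1, S0 + 1)
    tails = mags.sum() - np.cumsum(mags)[:S0]      # ||beta*_{S^c}||_1 for the best S of size s
    factor = 2.0 / ((1.0 - lam0 / lam) - 2.0 * kappa0)
    terms = lam * Delta**2 * s + tails
    k = int(np.argmin(terms))
    return float(factor * terms[k]), int(s[k])
```

With $\kappa_0 = 1/4$ and $\lambda_\ell = 3\lambda^0_n$, the prefactor computed by this function is $2/(2/3 - 1/2) = 12$.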

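Similarly, we show oracle inequalities for the prediction error in terms of the distortion of the kernel of the design.

Theorem 1.4. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Assume that $X$ satisfies UDP$(S_0, \kappa_0, \Delta)$ and that (1) holds. Then for any

$$\lambda_\ell > \lambda^0_n/(1 - 2\kappa_0), \tag{10}$$

it holds

$$\|X\beta^\ell - X\beta^\star\|_{\ell_2} \leq \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq S_0}}\Big(4\lambda_\ell\,\Delta\sqrt{s} + \frac{\|\beta^\star_{S^c}\|_{\ell_1}}{\Delta\sqrt{s}}\Big). \tag{14}$$

Invoking Lemma 1.2, the following holds: for every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/2$ and $\lambda_\ell$ satisfying (10), we have

$$\|X\beta^\ell - X\beta^\star\|_{\ell_2} \leq \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq (\kappa_0/\delta)^2 p}}\Big(\frac{8\lambda_\ell\,\delta}{\rho_n}\sqrt{s} + \frac{\rho_n}{2\delta\sqrt{s}}\,\|\beta^\star_{S^c}\|_{\ell_1}\Big), \tag{15}$$

where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.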

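Consider the case where the noise satisfies the hypothesis of Lemma 1.1 and take $\lambda^0_n = \lambda^0_n(1)$. Assume that $\kappa_0$ is constant (say $\kappa_0 = 1/4$) and take $\lambda_\ell = 3\lambda^0_n$, so that $4\lambda_\ell = 24\,\|X\|_{\ell_2,\infty}\,\sigma_n\sqrt{\log p}$. Then (14) becomes

$$\|X\beta^\ell - X\beta^\star\|_{\ell_2} \leq \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq S_0}}\Big(24\,\|X\|_{\ell_2,\infty}\,\Delta\sqrt{\log p}\ \sigma_n\sqrt{s} + \frac{\|\beta^\star_{S^c}\|_{\ell_1}}{\Delta\sqrt{s}}\Big),$$

which is not an oracle inequality stricto sensu because of the factor $1/(\Delta\sqrt{s})$ in the second term. As a matter of fact, this factor tends to lower the $s$-best term approximation term $\|\beta^\star_{S^c}\|_{\ell_1}$. Nevertheless, it is "almost" an oracle inequality, up to a multiplicative factor of the order of $\Delta\sqrt{\log p}$. In the same way, (15) becomes

$$\|X\beta^\ell - X\beta^\star\|_{\ell_2} \leq \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq p/(16\delta^2)}}\Big(48\,\|X\|_{\ell_2,\infty}\,\frac{\delta\sqrt{\log p}}{\rho_n}\ \sigma_n\sqrt{s} + \frac{\rho_n}{2\delta\sqrt{s}}\,\|\beta^\star_{S^c}\|_{\ell_1}\Big),$$

which is an oracle inequality up to a multiplicative factor $C'_{\text{mult}} := (\delta\sqrt{\log p})/\rho_n$. In the optimal case (7), this latter becomes

$$C'_{\text{mult}} \leq C\,\frac{1}{\rho_n}\Big(\frac{p\log p\,(1 + \log(p/n))}{n}\Big)^{1/2}, \tag{16}$$

where $C > 0$ is the same universal constant as in (7).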
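Similarly, the right-hand side of (14) can be evaluated by scanning $s$; the same applies to (20), with $\lambda_d$ in place of $\lambda_\ell$. The NumPy sketch below (hypothetical names) makes the trade-off visible: the first term grows like $\sqrt{s}$, while the tail term is damped by the factor $1/(\Delta\sqrt{s})$.

```python
import numpy as np


def prediction_bound(beta_star, lam, Delta, S0):
    """Right-hand side of (14) and the minimizing s.

    For each s, the best S is the set of the s largest entries, and the bound is
    4 * lam * Delta * sqrt(s) + ||beta*_{S^c}||_1 / (Delta * sqrt(s)).
    """
    mags = np.sort(np.abs(beta_star))[::-1]
    s = np.arange(1, S0 + 1)
    tails = mags.sum() - np.cumsum(mags)[:S0]
    terms = 4.0 * lam * Delta * np.sqrt(s) + tails / (Delta * np.sqrt(s))
    k = int(np.argmin(terms))
    return float(terms[k]), int(s[k])
```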

1.4. Results for the Dantzig selector. Similarly, we derive the same results for the Dantzig selector. The only difference is that the parameter $\kappa_0$ must be less than $1/4$. Here again the results fall into two parts. In the first one, we only assume that UDP holds. In the second, we invoke Lemma 1.2 to derive results in terms of the distortion of the design.

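Theorem 1.5. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Assume that $X$ satisfies UDP$(S_0, \kappa_0, \Delta)$ with $\kappa_0 < 1/4$ and that (1) holds. Then for any

$$\lambda_d > \lambda^0_n/(1 - 4\kappa_0), \tag{17}$$

it holds

$$\|\beta^d - \beta^\star\|_{\ell_1} \leq \frac{4}{\big(1 - \lambda^0_n/\lambda_d\big) - 4\kappa_0}\ \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq S_0}}\Big(\lambda_d\,\Delta^2 s + \|\beta^\star_{S^c}\|_{\ell_1}\Big). \tag{18}$$

Invoking Lemma 1.2, the following holds: for every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/4$ and $\lambda_d$ satisfying (17), we have

$$\|\beta^d - \beta^\star\|_{\ell_1} \leq \frac{4}{\big(1 - \lambda^0_n/\lambda_d\big) - 4\kappa_0}\ \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq (\kappa_0/\delta)^2 p}}\Big(\frac{4\lambda_d\,\delta^2}{\rho_n^2}\, s + \|\beta^\star_{S^c}\|_{\ell_1}\Big), \tag{19}$$

where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.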

The prediction error is given by the following theorem. 

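Theorem 1.6. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Assume that $X$ satisfies UDP$(S_0, \kappa_0, \Delta)$ with $\kappa_0 < 1/4$ and that (1) holds. Then for any

$$\lambda_d > \lambda^0_n/(1 - 4\kappa_0), \tag{17}$$

it holds

$$\|X\beta^d - X\beta^\star\|_{\ell_2} \leq \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq S_0}}\Big(4\lambda_d\,\Delta\sqrt{s} + \frac{\|\beta^\star_{S^c}\|_{\ell_1}}{\Delta\sqrt{s}}\Big). \tag{20}$$

Invoking Lemma 1.2, the following holds: for every full rank matrix $X \in \mathbb{R}^{n\times p}$, for all $0 < \kappa_0 < 1/4$ and $\lambda_d$ satisfying (17), we have

$$\|X\beta^d - X\beta^\star\|_{\ell_2} \leq \min_{\substack{S\subseteq\{1,\dots,p\}\\ |S| = s,\ s \leq (\kappa_0/\delta)^2 p}}\Big(\frac{8\lambda_d\,\delta}{\rho_n}\sqrt{s} + \frac{\rho_n}{2\delta\sqrt{s}}\,\|\beta^\star_{S^c}\|_{\ell_1}\Big), \tag{21}$$

where $\rho_n$ denotes the smallest singular value of $X$ and $\delta$ the distortion of its kernel.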
Observe that the same comments as in the lasso case (e.g. (13), (16)) hold. Finally, every result on constructing deterministic almost-Euclidean sections gives designs that satisfy the oracle inequalities above.

1.5. Organization of the paper. The rest of the paper falls into three parts. The next section compares UDP with the standard conditions used to derive oracle inequalities for the lasso. In particular, we show that RIP implies UDP. Section 3 recalls the best known results for constructing subspaces with low distortion. The last section is devoted to the proofs of the results.

2. An overview of the standard conditions

Oracle inequalities for the lasso have been established under a variety of different conditions on the design. A remarkable overview can be found in the article of van de Geer and Bühlmann [vdGB09]. We recall some sufficient conditions here. For all $s \in \{1,\dots,p\}$, we denote by $\Sigma_s \subseteq \mathbb{R}^p$ the set of all the $s$-sparse vectors.

♦ Restricted Isometry Property: A matrix $X \in \mathbb{R}^{n\times p}$ satisfies RIP$(\theta_S)$ if and only if there exists $0 < \theta_S < 1$ (as small as possible) such that for all $s \in \{1,\dots,S\}$ and for all $\gamma \in \Sigma_s$, it holds

$$(1 - \theta_S)\|\gamma\|_{\ell_2}^2 \leq \|X\gamma\|_{\ell_2}^2 \leq (1 + \theta_S)\|\gamma\|_{\ell_2}^2.$$

The constant $\theta_S$ is called the $S$-restricted isometry constant.

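♦ Restricted Eigenvalue Assumption [BRT09]: A matrix $X \in \mathbb{R}^{n\times p}$ satisfies RE$(S, c_0)$ if and only if

$$\kappa(S, c_0) = \min_{\substack{T\subseteq\{1,\dots,p\}\\ |T| \leq S}}\ \min_{\substack{\gamma\neq0\\ \|\gamma_{T^c}\|_{\ell_1} \leq c_0\|\gamma_T\|_{\ell_1}}}\frac{\|X\gamma\|_{\ell_2}}{\|\gamma_T\|_{\ell_2}} > 0.$$

The constant $\kappa(S, c_0)$ is called the $(S, c_0)$-restricted $\ell_2$-eigenvalue.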

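♦ Compatibility Condition [vdGB09]: A matrix $X \in \mathbb{R}^{n\times p}$ satisfies the condition Compatibility$(S, c_0)$ if and only if

$$\phi(S, c_0) = \min_{\substack{T\subseteq\{1,\dots,p\}\\ |T| \leq S}}\ \min_{\substack{\gamma\neq0\\ \|\gamma_{T^c}\|_{\ell_1} \leq c_0\|\gamma_T\|_{\ell_1}}}\frac{\sqrt{|T|}\,\|X\gamma\|_{\ell_2}}{\|\gamma_T\|_{\ell_1}} > 0.$$

The constant $\phi(S, c_0)$ is called the $(S, c_0)$-restricted $\ell_1$-eigenvalue.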
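♦ $H_{S,1}$ Condition [JN10]: $X \in \mathbb{R}^{n\times p}$ satisfies the $H_{S,1}(\kappa)$ condition (with $\kappa < 1/2$) if and only if for all $\gamma \in \mathbb{R}^p$ and for all $T \subseteq \{1,\dots,p\}$ such that $|T| \leq S$, it holds

$$\|\gamma_T\|_{\ell_1} \leq |T|\,\hat\lambda\,\|X\gamma\|_{\ell_2} + \kappa\|\gamma\|_{\ell_1}, \tag{22}$$

where $\hat\lambda$ denotes the maximum of the $\ell_2$-norms of the columns of $X$.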

Remark. This latter condition is weaker than the UDP condition. Nevertheless, the authors of [JN10] established limits of performance of their conditions: the condition $H_{s,\infty}(1/3)$ (which implies $H_{s,1}(1/3)$) is feasible only in a severely restricted range of the sparsity parameter $s$. Notice that this is not the case for the UDP condition: (9) shows that it is feasible for a large range of the sparsity parameter $s$ (indeed an optimal range).

Let us emphasize that the above description is not meant to be exhaustive. In particular, we do not mention the irrepresentable condition [ZY06], which ensures exact recovery of the support. The next proposition shows that the UDP condition is weaker than the RIP, RE and Compatibility conditions.
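Proposition 2.1. Let $X \in \mathbb{R}^{n\times p}$ be a full rank matrix. Then the following is true:

♦ The RIP$(\theta_{5S})$ condition with $\theta_{5S} < \sqrt{2} - 1$ implies UDP$(S, \kappa_0, \Delta)$ for all pairs $(\kappa_0, \Delta)$ such that

$$\Big(1 + 2\sqrt{\frac{1-\theta_{5S}}{1+\theta_{5S}}}\Big)^{-1} < \kappa_0 < \frac12 \quad\text{and}\quad \Delta \geq \Big(\sqrt{1-\theta_{5S}} - \frac{1-\kappa_0}{2\kappa_0}\sqrt{1+\theta_{5S}}\Big)^{-1}. \tag{23}$$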

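♦ The RE$(S, c_0)$ condition with $c_0 > 1$ implies UDP$(S, (1+c_0)^{-1}, \kappa(S, c_0)^{-1})$.

♦ The Compatibility$(S, c_0)$ condition with $c_0 > 1$ implies UDP$(S, (1+c_0)^{-1}, \phi(S, c_0)^{-1})$.

Proof. Let $\kappa_0 = (1+c_0)^{-1}$, which lies in $(0, 1/2)$ since $c_0 > 1$. Let $\gamma \in \mathbb{R}^p$ and $T \subseteq \{1,\dots,p\}$ with $|T| = s \leq S$.

- If $\|\gamma_T\|_{\ell_1} \leq \kappa_0\|\gamma\|_{\ell_1}$, then (6) holds trivially.
- Otherwise, $(1+c_0)\|\gamma_T\|_{\ell_1} > \|\gamma_T\|_{\ell_1} + \|\gamma_{T^c}\|_{\ell_1}$, i.e. $\|\gamma_{T^c}\|_{\ell_1} < c_0\|\gamma_T\|_{\ell_1}$. In this case the RE$(S, c_0)$ condition gives $\|\gamma_T\|_{\ell_1} \leq \sqrt{s}\,\|\gamma_T\|_{\ell_2} \leq \kappa(S, c_0)^{-1}\sqrt{s}\,\|X\gamma\|_{\ell_2}$. The Compatibility$(S, c_0)$ condition gives $\|\gamma_T\|_{\ell_1} \leq \phi(S, c_0)^{-1}\sqrt{s}\,\|X\gamma\|_{\ell_2}$.

In both cases (6) follows.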

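Assume that $X$ satisfies RIP$(\theta_{5S})$. Let $\gamma \in \mathbb{R}^p$, $s \in \{1,\dots,S\}$, and $T_0 \subseteq \{1,\dots,p\}$ such that $|T_0| = s$. Choose a pair $(\kappa_0, \Delta)$ as in (23).

♦ If $\|\gamma_{T_0}\|_{\ell_1} \leq \kappa_0\|\gamma\|_{\ell_1}$, then $\|\gamma_{T_0}\|_{\ell_1} \leq \Delta\sqrt{s}\,\|X\gamma\|_{\ell_2} + \kappa_0\|\gamma\|_{\ell_1}$.

♦ Suppose that $\|\gamma_{T_0}\|_{\ell_1} > \kappa_0\|\gamma\|_{\ell_1}$. Then

$$\|\gamma_{T_0^c}\|_{\ell_1} < \frac{1-\kappa_0}{\kappa_0}\,\|\gamma_{T_0}\|_{\ell_1}. \tag{24}$$

Denote by $T_1$ the set of the indices of the $4s$ largest coefficients (in absolute value) in $T_0^c$, by $T_2$ the set of the indices of the $4s$ largest coefficients in $(T_0 \cup T_1)^c$, and so on. Hence we decompose $T_0^c$ into disjoint sets

$$T_0^c = T_1 \cup T_2 \cup \dots \cup T_l.$$

Using (24), it yields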

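$$\sum_{i\geq2}\|\gamma_{T_i}\|_{\ell_2} \leq (4s)^{-1/2}\sum_{i\geq1}\|\gamma_{T_i}\|_{\ell_1} = (4s)^{-1/2}\|\gamma_{T_0^c}\|_{\ell_1} \leq \frac{1-\kappa_0}{2\kappa_0\sqrt{s}}\,\|\gamma_{T_0}\|_{\ell_1}, \tag{25}$$

where the first inequality holds because every coefficient indexed by $T_{i+1}$ is, in absolute value, at most the average of the $|\gamma_j|$ over $j \in T_i$. Hence $\|\gamma_{T_{i+1}}\|_{\ell_2} \leq \sqrt{4s}\,\|\gamma_{T_i}\|_{\ell_1}/(4s)$.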

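Using RIP$(\theta_{5S})$ (note that $|T_0 \cup T_1| \leq 5s$ and $|T_i| \leq 4s$) and (25), it follows that

$$\begin{aligned}
\|X\gamma\|_{\ell_2} &\geq \|X\gamma_{T_0\cup T_1}\|_{\ell_2} - \sum_{i\geq2}\|X\gamma_{T_i}\|_{\ell_2}\\
&\geq \sqrt{1-\theta_{5S}}\,\|\gamma_{T_0\cup T_1}\|_{\ell_2} - \sqrt{1+\theta_{5S}}\sum_{i\geq2}\|\gamma_{T_i}\|_{\ell_2}\\
&\geq \Big(\sqrt{1-\theta_{5S}} - \frac{1-\kappa_0}{2\kappa_0}\sqrt{1+\theta_{5S}}\Big)\frac{\|\gamma_{T_0}\|_{\ell_1}}{\sqrt{s}},
\end{aligned}$$

using $\|\gamma_{T_0\cup T_1}\|_{\ell_2} \geq \|\gamma_{T_0}\|_{\ell_2} \geq \|\gamma_{T_0}\|_{\ell_1}/\sqrt{s}$.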

The lower bound on $\kappa_0$ shows that the right-hand side is positive. Observe that we took $\Delta$ such that this latter is at least $\|\gamma_{T_0}\|_{\ell_1}/(\Delta\sqrt{s})$. Finally, we get

$$\|\gamma_{T_0}\|_{\ell_1} \leq \Delta\sqrt{s}\,\|X\gamma\|_{\ell_2} \leq \Delta\sqrt{s}\,\|X\gamma\|_{\ell_2} + \kappa_0\|\gamma\|_{\ell_1}.$$

This ends the proof. □

Hence the UDP condition is weaker than the RIP, RE and Compatibility conditions.
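For very small designs, the restricted isometry constant can be computed by brute force, and (23) then gives admissible UDP parameters. The NumPy sketch below is only an illustration with hypothetical helper names. The enumeration over subsets is exponential in $k$, so it is unusable at realistic sizes.

```python
import itertools

import numpy as np


def rip_constant(X, k):
    """Brute-force restricted isometry constant theta_k of X (tiny p only).

    By Cauchy interlacing, subsets of size < k cannot give a larger deviation,
    so it suffices to scan the subsets of size exactly k.
    """
    p = X.shape[1]
    theta = 0.0
    for T in itertools.combinations(range(p), k):
        XT = X[:, list(T)]
        eig = np.linalg.eigvalsh(XT.T @ XT)          # ascending eigenvalues
        theta = max(theta, 1.0 - eig[0], eig[-1] - 1.0)
    return float(theta)


def kappa0_lower_bound(theta):
    """Lower end of the admissible kappa0 range in (23), with theta = theta_{5S}."""
    if not theta < np.sqrt(2.0) - 1.0:
        raise ValueError("Proposition 2.1 assumes theta_{5S} < sqrt(2) - 1")
    r = np.sqrt((1.0 - theta) / (1.0 + theta))
    return float(1.0 / (1.0 + 2.0 * r))


def delta_from_rip(theta, kappa0):
    """Smallest Delta allowed by (23) for kappa0 in (kappa0_lower_bound(theta), 1/2)."""
    gap = np.sqrt(1.0 - theta) - (1.0 - kappa0) / (2.0 * kappa0) * np.sqrt(1.0 + theta)
    if gap <= 0.0:
        raise ValueError("kappa0 is not above the lower bound in (23)")
    return float(1.0 / gap)
```

For instance, with $\theta_{5S} = 0$ the admissible range is $1/3 < \kappa_0 < 1/2$, and $\kappa_0 = 0.4$ gives $\Delta \geq 4$.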
3. The distortion of the design

One of the big issues in modern statistics is to find verifiable conditions. This question matters to the statistics community because the RIP condition (which is the keystone of Compressed Sensing) is not known to be efficiently checkable for a given matrix. To overcome this difficulty, at the price of weaker results, we investigate the role of the distortion in high-dimensional regression. It is known that there is a connection between the Compressed Sensing problem and the problem of estimating the distortion. This framework was studied by numerous authors [CDD09, KT07, DeV07]. It might interest both people working on building deterministic almost-Euclidean sections of the $\ell_1$-ball and those looking for deterministic designs for Compressed Sensing. Table 1 presents some important results for constructing almost-Euclidean sections of the $\ell_1$-ball. The last line of Table 1 deals with the optimal case derived from a probabilistic construction. Even though this result was established in the late '70s, no deterministic construction achieving it is known.



| Reference | Distortion | Co-dimension | Randomness |
|---|---|---|---|
| [Ind07] | $1 + \varepsilon$ | $p - p^{1-o_\varepsilon(1)}$ | Explicit |
| [GLR08] | $(\log p)^{O_\eta(\log\log\log p)}$ | $\eta p$ | Explicit |
| [IS10] | $1 + \varepsilon$ | $(1 - (\gamma\varepsilon)^{O(1/\gamma)})\,p$ | $O(p^\gamma)$ |
| [Kas77] | $C\big(p\,(1 + \log(p/n))/n\big)^{1/2}$ | $n$ | $np$ |

Table 1. The best known results for constructing almost-Euclidean subspaces. The parameters $\varepsilon, \eta, \gamma \in (0,1)$ are assumed to be constants, although the dependence on them is subsumed by the big-Oh notation. The parameter $C > 0$ denotes a universal constant.



Most of the explicit constructions can be viewed as related to error-correcting codes. Indeed, the construction of [Ind07] is based on amplifying the minimum distance of a code using expanders, while the construction of [GLR08] is based on Low-Density Parity Check (LDPC) codes. Lastly, the construction of [IS10] is related to the tensor product of error-correcting codes. The main reason for this surprising fact is that the vectors of a subspace of low distortion must be "well-spread", i.e. a small subset of their coordinates cannot contain most of their $\ell_2$-norm (cf. [Ind07, GLR08]). This property is required from a good error-correcting code, where the weight (i.e. the $\ell_0$-norm) of each codeword cannot be concentrated on a small subset of its coordinates. Similarly, this property was intensively studied in Compressed Sensing; see for instance the Null Space Property in [CDD09].

4. Proofs 

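Proof of Lemma 1.1. Observe that $X^\top z \sim \mathcal{N}_p(0, \sigma_n^2 X^\top X)$. Hence,

$$\forall j = 1,\dots,p,\quad X_j^\top z \sim \mathcal{N}\big(0, \sigma_n^2\|X_j\|_{\ell_2}^2\big).$$

Using Šidák's inequality [S68], it yields

$$\mathbb{P}\big(\|X^\top z\|_{\ell_\infty} \leq \lambda^0_n\big) \geq \mathbb{P}\big(\|\tilde z\|_{\ell_\infty} \leq \lambda^0_n\big) = \prod_{i=1}^p\mathbb{P}\big(|\tilde z_i| \leq \lambda^0_n\big),$$

where the $\tilde z_i$'s are i.i.d. with respect to $\mathcal{N}(0, \sigma_n^2\|X\|_{\ell_2,\infty}^2)$. Denote by $\Phi$ and $\varphi$ respectively the cumulative distribution function and the probability density function of the standard normal. Set $\theta = (1+t)\sqrt{\log p}$. It holds

$$\prod_{i=1}^p\mathbb{P}\big(|\tilde z_i| \leq \lambda^0_n\big) = \mathbb{P}\big(|\mathcal{N}(0,1)| \leq \theta\big)^p = (2\Phi(\theta) - 1)^p \geq \big(1 - 2\varphi(\theta)/\theta\big)^p,$$

using an integration by parts to get $1 - \Phi(\theta) \leq \varphi(\theta)/\theta$. By Bernoulli's inequality, it yields that

$$\mathbb{P}\big(\|X^\top z\|_{\ell_\infty} \leq \lambda^0_n\big) \geq 1 - \frac{2p\,\varphi(\theta)}{\theta} = 1 - \frac{\sqrt{2}}{(1+t)\sqrt{\pi\log p}\;p^{(1+t)^2/2-1}}.$$

This concludes the proof. □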
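The two elementary steps of this proof can be checked numerically with the Python standard library alone:

- the Gaussian tail bound $1 - \Phi(\theta) \leq \varphi(\theta)/\theta$;
- the closed form of $1 - 2p\,\varphi(\theta)/\theta$.

A minimal sketch with hypothetical helper names:

```python
import math


def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_upper_tail(x):
    """1 - Phi(x), computed through erfc for accuracy far in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def lemma_11_bound(p, t):
    """1 - 2 p phi(theta) / theta with theta = (1 + t) sqrt(log p), and its closed form."""
    theta = (1.0 + t) * math.sqrt(math.log(p))
    via_pdf = 1.0 - 2.0 * p * normal_pdf(theta) / theta
    closed = 1.0 - math.sqrt(2.0) / (
        (1.0 + t) * math.sqrt(math.pi * math.log(p)) * p ** ((1.0 + t) ** 2 / 2.0 - 1.0)
    )
    return via_pdf, closed
```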

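Proof of Lemma 1.2. Consider the following singular value decomposition $X = U^\top D A$, where

• $U \in \mathbb{R}^{n\times n}$ is such that $UU^\top = \mathrm{Id}_n$,

• $D = \mathrm{Diag}(\rho_1,\dots,\rho_n)$ is a diagonal matrix, where $\rho_1 \geq \dots \geq \rho_n > 0$ are the singular values of $X$,

• and $A \in \mathbb{R}^{n\times p}$ is such that $AA^\top = \mathrm{Id}_n$.

We recall that the only assumption on the design is that it has full rank, which yields $\rho_n > 0$. Let $\delta$ be the distortion of the kernel $\Gamma$ of the design. Denote by $\pi_\Gamma$ (resp. $\pi_{\Gamma^\perp}$) the $\ell_2$-projection onto $\Gamma$ (resp. $\Gamma^\perp$). Let $\gamma \in \mathbb{R}^p$; then $\gamma = \pi_\Gamma(\gamma) + \pi_{\Gamma^\perp}(\gamma)$. An easy calculation shows that

$$\pi_{\Gamma^\perp}(\gamma) = A^\top A\gamma,$$

so that $\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2} = \|A\gamma\|_{\ell_2} \leq \|X\gamma\|_{\ell_2}/\rho_n$, since $\|X\gamma\|_{\ell_2} = \|DA\gamma\|_{\ell_2} \geq \rho_n\|A\gamma\|_{\ell_2}$. Let $s \in \{1,\dots,S_0\}$ and let $S \subseteq \{1,\dots,p\}$ be such that $|S| = s$. It holds

$$\begin{aligned}
\|\gamma_S\|_{\ell_1} &\leq \sqrt{s}\,\|\gamma\|_{\ell_2} \leq \sqrt{s}\,\|\pi_\Gamma(\gamma)\|_{\ell_2} + \sqrt{s}\,\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2}\\
&\leq \frac{\sqrt{s}\,\delta}{\sqrt{p}}\,\|\pi_\Gamma(\gamma)\|_{\ell_1} + \sqrt{s}\,\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2}\\
&\leq \frac{\sqrt{s}\,\delta}{\sqrt{p}}\big(\|\gamma\|_{\ell_1} + \|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_1}\big) + \sqrt{s}\,\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2}\\
&\leq \frac{\sqrt{S_0}\,\delta}{\sqrt{p}}\,\|\gamma\|_{\ell_1} + (1+\delta)\,\frac{\sqrt{s}}{\rho_n}\,\|X\gamma\|_{\ell_2}.
\end{aligned}$$

These steps use the triangle inequality, the distortion of the kernel $\Gamma$, and $\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_1} \leq \sqrt{p}\,\|\pi_{\Gamma^\perp}(\gamma)\|_{\ell_2}$. Since $\delta \geq 1$, we have $1 + \delta \leq 2\delta$. Finally, set $\kappa_0 = (\sqrt{S_0}/\sqrt{p})\,\delta$ and $\Delta = 2\delta/\rho_n$. This ends the proof. □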
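The linear-algebra facts used above can be checked numerically: $X = U^\top D A$, $\pi_{\Gamma^\perp}(\gamma) = A^\top A\gamma$, and $\|A\gamma\|_{\ell_2} \leq \|X\gamma\|_{\ell_2}/\rho_n$. Below is a NumPy sketch, assuming $X$ has rank $n < p$. It maps the thin SVD returned by `numpy.linalg.svd` to the notation of the proof, and the helper names are hypothetical.

```python
import numpy as np


def svd_factors(X):
    """Return U, D, A with X = U^T D A, as in the proof of Lemma 1.2."""
    W, sing, Vt = np.linalg.svd(X, full_matrices=False)   # X = W diag(sing) Vt
    return W.T, np.diag(sing), Vt                          # U = W^T, A = Vt with A A^T = Id_n


def check_projection_step(X, gamma):
    """Check that A^T A gamma projects onto ker(X)^perp and ||A gamma|| <= ||X gamma|| / rho_n."""
    _, D, A = svd_factors(X)
    rho_n = D[-1, -1]                                      # singular values come in decreasing order
    proj_perp = A.T @ (A @ gamma)
    proj_ker = gamma - proj_perp
    scale = max(1.0, float(gamma @ gamma))
    in_kernel = np.allclose(X @ proj_ker, 0.0, atol=1e-9 * scale)
    orthogonal = abs(float(proj_ker @ proj_perp)) <= 1e-9 * scale
    lhs = np.linalg.norm(A @ gamma)
    rhs = np.linalg.norm(X @ gamma) / rho_n
    return bool(in_kernel and orthogonal and lhs <= rhs * (1.0 + 1e-12) + 1e-12)
```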

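Proof of Theorem 1.3. We recall that $\lambda^0_n$ denotes an upper bound on the amplification of the noise, see (1). We begin with a standard result.

Lemma 4.1. Let $h = \beta^\ell - \beta^\star \in \mathbb{R}^p$ and $\lambda_\ell \geq \lambda^0_n$. Then, for all subsets $S \subseteq \{1,\dots,p\}$, it holds

$$\frac12\|Xh\|_{\ell_2}^2 + (\lambda_\ell - \lambda^0_n)\|h\|_{\ell_1} \leq 2\lambda_\ell\|h_S\|_{\ell_1} + 2\lambda_\ell\|\beta^\star_{S^c}\|_{\ell_1}. \tag{26}$$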
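Proof. By optimality, we have

$$\frac12\|X\beta^\ell - y\|_{\ell_2}^2 + \lambda_\ell\|\beta^\ell\|_{\ell_1} \leq \frac12\|X\beta^\star - y\|_{\ell_2}^2 + \lambda_\ell\|\beta^\star\|_{\ell_1}.$$

Since $y = X\beta^\star + z$, it yields

$$\frac12\|Xh\|_{\ell_2}^2 - \langle X^\top z, h\rangle + \lambda_\ell\|\beta^\ell\|_{\ell_1} \leq \lambda_\ell\|\beta^\star\|_{\ell_1}.$$

Let $S \subseteq \{1,\dots,p\}$. Since $\|\beta^\ell\|_{\ell_1} \geq \|\beta^\star_S\|_{\ell_1} - \|h_S\|_{\ell_1} + \|h_{S^c}\|_{\ell_1} - \|\beta^\star_{S^c}\|_{\ell_1}$, we have

$$\frac12\|Xh\|_{\ell_2}^2 + \lambda_\ell\|h_{S^c}\|_{\ell_1} \leq \lambda_\ell\|h_S\|_{\ell_1} + 2\lambda_\ell\|\beta^\star_{S^c}\|_{\ell_1} + \langle X^\top z, h\rangle.$$

Using (1), it holds $\langle X^\top z, h\rangle \leq \lambda^0_n\big(\|h_S\|_{\ell_1} + \|h_{S^c}\|_{\ell_1}\big)$, hence

$$\frac12\|Xh\|_{\ell_2}^2 + (\lambda_\ell - \lambda^0_n)\|h_{S^c}\|_{\ell_1} \leq (\lambda_\ell + \lambda^0_n)\|h_S\|_{\ell_1} + 2\lambda_\ell\|\beta^\star_{S^c}\|_{\ell_1}.$$

Adding $(\lambda_\ell - \lambda^0_n)\|h_S\|_{\ell_1}$ on both sides, we conclude the proof. □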
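Lemma 4.1 can be sanity-checked on synthetic data. The sketch below uses NumPy, and its sizes and names are hypothetical. It takes $\lambda^0_n = \|X^\top z\|_{\ell_\infty}$, so that (1) holds by construction. It approximates the lasso by ISTA, a swapped-in solver rather than the SOCP approach mentioned in the introduction. It then evaluates the gap between the two sides of (26) for a few subsets $S$.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=20000):
    """Proximal-gradient (ISTA) approximation of the lasso estimator (2)."""
    L = np.linalg.norm(X, 2) ** 2                # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = b - X.T @ (X @ b - y) / L
        b = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)   # soft-thresholding
    return b


def lemma_41_gap(X, h, beta_star, lam, lam0, S):
    """Right-hand side minus left-hand side of (26) for the subset S.

    Lemma 4.1 predicts a non-negative value when lam >= lam0 >= ||X^T z||_inf,
    up to the accuracy of the approximate lasso solution used to form h.
    """
    S = np.asarray(S, dtype=int)
    Sc = np.setdiff1d(np.arange(X.shape[1]), S)
    lhs = 0.5 * np.sum((X @ h) ** 2) + (lam - lam0) * np.abs(h).sum()
    rhs = 2.0 * lam * np.abs(h[S]).sum() + 2.0 * lam * np.abs(beta_star[Sc]).sum()
    return float(rhs - lhs)
```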
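Using (6) and (26), it follows that, for all $S \subseteq \{1,\dots,p\}$ with $|S| = s \leq S_0$,

$$\frac{1}{2\lambda_\ell}\Big[\frac12\|Xh\|_{\ell_2}^2 + (\lambda_\ell - \lambda^0_n)\|h\|_{\ell_1}\Big] \leq \Delta\sqrt{s}\,\|Xh\|_{\ell_2} + \kappa_0\|h\|_{\ell_1} + \|\beta^\star_{S^c}\|_{\ell_1}. \tag{27}$$

It yields

$$\Big(\frac{1 - \lambda^0_n/\lambda_\ell}{2} - \kappa_0\Big)\|h\|_{\ell_1} \leq \Big(-\frac{1}{4\lambda_\ell}\|Xh\|_{\ell_2}^2 + \Delta\sqrt{s}\,\|Xh\|_{\ell_2}\Big) + \|\beta^\star_{S^c}\|_{\ell_1} \leq \lambda_\ell\Delta^2 s + \|\beta^\star_{S^c}\|_{\ell_1}.$$

This uses the fact that the polynomial $x \mapsto -x^2/(4\lambda_\ell) + \Delta\sqrt{s}\,x$ is not greater than $\lambda_\ell\Delta^2 s$, its value at the maximizer $x = 2\lambda_\ell\Delta\sqrt{s}$. By (10), the coefficient on the left-hand side is positive, and dividing by it gives (11). This concludes the proof. □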

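Proof of Theorem 1.5. We begin with a standard result.

Lemma 4.2. Let $h = \beta^d - \beta^\star \in \mathbb{R}^p$ and $\lambda_d \geq \lambda^0_n$. Then, for all subsets $S \subseteq \{1,\dots,p\}$, it holds

$$\frac{\|Xh\|_{\ell_2}^2 + (\lambda_d - \lambda^0_n)\|h\|_{\ell_1}}{4\lambda_d} \leq \|h_S\|_{\ell_1} + \|\beta^\star_{S^c}\|_{\ell_1}. \tag{28}$$

Proof. Recall that $\|X^\top z\|_{\ell_\infty} \leq \lambda^0_n$. It yields

$$\|Xh\|_{\ell_2}^2 \leq \|X^\top Xh\|_{\ell_\infty}\|h\|_{\ell_1} = \big\|X^\top(X\beta^d - y) + X^\top(y - X\beta^\star)\big\|_{\ell_\infty}\|h\|_{\ell_1} \leq (\lambda_d + \lambda^0_n)\|h\|_{\ell_1}.$$

Hence we get

$$\|Xh\|_{\ell_2}^2 - (\lambda_d + \lambda^0_n)\|h_{S^c}\|_{\ell_1} \leq (\lambda_d + \lambda^0_n)\|h_S\|_{\ell_1}. \tag{29}$$

Since $\|X^\top(y - X\beta^\star)\|_{\ell_\infty} = \|X^\top z\|_{\ell_\infty} \leq \lambda^0_n \leq \lambda_d$, the target $\beta^\star$ is feasible, and it yields $\|\beta^d\|_{\ell_1} \leq \|\beta^\star\|_{\ell_1}$. Thus,

$$\|\beta^\star_S\|_{\ell_1} - \|h_S\|_{\ell_1} + \|h_{S^c}\|_{\ell_1} - \|\beta^\star_{S^c}\|_{\ell_1} \leq \|\beta^d\|_{\ell_1} \leq \|\beta^\star_S\|_{\ell_1} + \|\beta^\star_{S^c}\|_{\ell_1},$$

which yields

$$\|h_{S^c}\|_{\ell_1} \leq \|h_S\|_{\ell_1} + 2\|\beta^\star_{S^c}\|_{\ell_1}. \tag{30}$$

Combining (29) $+\,2\lambda_d\cdot$(30), we get

$$\|Xh\|_{\ell_2}^2 + (\lambda_d - \lambda^0_n)\|h_{S^c}\|_{\ell_1} \leq (3\lambda_d + \lambda^0_n)\|h_S\|_{\ell_1} + 4\lambda_d\|\beta^\star_{S^c}\|_{\ell_1}.$$

Adding $(\lambda_d - \lambda^0_n)\|h_S\|_{\ell_1}$ on both sides, we conclude the proof. □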






YOHANN DE CASTRO 



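Using (6) and (28), it follows that, for all $S \subseteq \{1,\dots,p\}$ with $|S| = s \leq S_0$,

$$\frac{1}{4\lambda_d}\Big[\|Xh\|_{\ell_2}^2 + (\lambda_d - \lambda^0_n)\|h\|_{\ell_1}\Big] \leq \Delta\sqrt{s}\,\|Xh\|_{\ell_2} + \kappa_0\|h\|_{\ell_1} + \|\beta^\star_{S^c}\|_{\ell_1}. \tag{31}$$

It yields

$$\Big(\frac{1 - \lambda^0_n/\lambda_d}{4} - \kappa_0\Big)\|h\|_{\ell_1} \leq \Big(-\frac{1}{4\lambda_d}\|Xh\|_{\ell_2}^2 + \Delta\sqrt{s}\,\|Xh\|_{\ell_2}\Big) + \|\beta^\star_{S^c}\|_{\ell_1} \leq \lambda_d\Delta^2 s + \|\beta^\star_{S^c}\|_{\ell_1}.$$

This uses the fact that the polynomial $x \mapsto -x^2/(4\lambda_d) + \Delta\sqrt{s}\,x$ is not greater than $\lambda_d\Delta^2 s$. By (17), the coefficient on the left-hand side is positive, and (18) follows. This concludes the proof. □

Proof of Theorem 1.4 and Theorem 1.6. Using (27), we know that

$$\frac{1}{2\lambda_\ell}\Big[\frac12\|Xh\|_{\ell_2}^2 + (\lambda_\ell - \lambda^0_n)\|h\|_{\ell_1}\Big] \leq \Delta\sqrt{s}\,\|Xh\|_{\ell_2} + \kappa_0\|h\|_{\ell_1} + \|\beta^\star_{S^c}\|_{\ell_1}.$$

Since (10) gives $(\lambda_\ell - \lambda^0_n)/(2\lambda_\ell) > \kappa_0$, it follows that

$$\|Xh\|_{\ell_2}^2 - 4\lambda_\ell\Delta\sqrt{s}\,\|Xh\|_{\ell_2} \leq 4\lambda_\ell\|\beta^\star_{S^c}\|_{\ell_1}.$$

This latter is of the form $x^2 - bx \leq c$ with $b > 0$ and $c \geq 0$, which implies that $x \leq b + c/b$. Hence,

$$\|Xh\|_{\ell_2} \leq 4\lambda_\ell\Delta\sqrt{s} + \frac{\|\beta^\star_{S^c}\|_{\ell_1}}{\Delta\sqrt{s}}.$$

The same analysis holds for Theorem 1.6, starting from (31) and (17). □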
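The last step relies on an elementary fact about quadratics. The sketch below uses standard-library Python with hypothetical names. It compares the bound $b + c/b$ with the exact largest solution of $x^2 - bx \leq c$. It then instantiates the bound with $b = 4\lambda_\ell\Delta\sqrt{s}$ and $c = 4\lambda_\ell\|\beta^\star_{S^c}\|_{\ell_1}$, which gives the summand of (14) for a fixed $S$.

```python
import math


def quad_bound(b, c):
    """If x >= 0 and x^2 - b x <= c with b > 0, c >= 0, then x <= b + c / b."""
    if b <= 0 or c < 0:
        raise ValueError("need b > 0 and c >= 0")
    return b + c / b


def largest_root(b, c):
    """Exact largest x with x^2 - b x <= c, i.e. the positive root of x^2 - b x - c."""
    return 0.5 * (b + math.sqrt(b * b + 4.0 * c))


def prediction_bound_for_s(lam, Delta, s, tail):
    """b = 4 lam Delta sqrt(s), c = 4 lam tail: returns (b + c / b, exact root)."""
    b = 4.0 * lam * Delta * math.sqrt(s)
    c = 4.0 * lam * tail
    return quad_bound(b, c), largest_root(b, c)
```

Since $\tfrac12\big(b + \sqrt{b^2 + 4c}\big) \leq b + c/b$, the exact root gives a slightly sharper, if less readable, bound.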